Interpretable Classification Models for Recidivism Prediction
Authors
Abstract
We investigate a long-debated question: how to create predictive models of recidivism that are sufficiently accurate, transparent, and interpretable to use for decision-making. The question is complicated because these models support different decisions, from sentencing, to determining release on probation, to allocating preventative social services. Each use case might have an objective other than classification accuracy, such as a desired true positive rate (TPR) or false positive rate (FPR). Each (TPR, FPR) pair is a point on the receiver operating characteristic (ROC) curve. We use popular machine learning methods to create models along the full ROC curve on a wide range of recidivism prediction problems. We show that many methods (SVM, SGB, Ridge Regression) produce equally accurate models along the full ROC curve. However, methods that are designed for interpretability (CART, C5.0) cannot be tuned to produce models that are accurate and/or interpretable. To handle this shortcoming, we use a recent method called Supersparse Linear Integer Models (SLIM) to produce accurate, transparent, and interpretable scoring systems along the full ROC curve. These scoring systems can be used for decision-making in many different use cases, since they are just as accurate as the most powerful black-box machine learning models for many applications, yet completely transparent and highly interpretable.
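To make the link between a scoring system and the ROC curve concrete, here is a minimal sketch in Python. It is not the SLIM method and does not reproduce any model from the paper: the feature names and integer point values in `POINTS` are hypothetical, chosen only for illustration. The sketch shows how an integer scoring system assigns a score, and how each decision threshold on that score yields one (TPR, FPR) pair.

```python
from typing import Dict, List, Tuple

# Hypothetical integer point values, for illustration only (not from the paper).
POINTS: Dict[str, int] = {
    "prior_arrests_ge_2": 2,
    "age_lt_25": 1,
    "no_prior_arrests": -2,
}


def score(features: Dict[str, int]) -> int:
    """Sum the points of every binary feature that is present (value 1)."""
    return sum(pts * features.get(name, 0) for name, pts in POINTS.items())


def tpr_fpr(scores: List[int], labels: List[int], threshold: int) -> Tuple[float, float]:
    """Predict 'recidivates' when score >= threshold; return (TPR, FPR).

    Assumes labels contain at least one positive (1) and one negative (0).
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives


def roc_points(scores: List[int], labels: List[int]) -> List[Tuple[float, float]]:
    """One (TPR, FPR) pair per distinct threshold, plus one above the maximum score."""
    thresholds = sorted(set(scores)) + [max(scores) + 1]
    return [tpr_fpr(scores, labels, t) for t in thresholds]
```

Lowering the threshold raises both TPR and FPR, so a use case that needs a particular TPR or FPR corresponds to choosing a particular threshold (or, in the paper's setting, a model trained for that operating point).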
Similar resources
A Decision Tree Approach to Predicting Recidivism in Domestic Violence
Domestic violence (DV) is a global social and public health issue that is highly gendered. Being able to accurately predict DV recidivism, i.e., re-offending of a previously convicted offender, can speed up and improve risk assessment procedures for police and front-line agencies, better protect victims of DV, and potentially prevent future reoccurrences of DV. Previous work in DV recidivism ha...
Prediction of melting points of a diverse chemical set using fuzzy regression tree
Classification and regression trees (CART) have the advantage of being able to handle large data sets and yield readily interpretable models. In spite of these advantages, they are also recognized as highly unstable classifiers with respect to minor perturbations in the training data. In other words, these methods present high variance. Fuzzy logic brings in an improvement in these aspects due ...
S3PSO: Students’ Performance Prediction Based on Particle Swarm Optimization
Nowadays, new methods are required to take advantage of the rich and extensive gold mine of data, particularly the vast amount created by educational systems. Data mining algorithms have been used in educational systems, especially e-learning systems, due to the broad usage of these systems. Providing a model to predict final student results in an educational course is a reason for usi...
A NOTE TO INTERPRETABLE FUZZY MODELS AND THEIR LEARNING
In this paper we turn our attention to a well-developed theory of fuzzy/linguistic models that are interpretable and, moreover, can be learned from data. We present four different situations demonstrating both the interpretability and the learning abilities of these models.
A New High-order Takagi-Sugeno Fuzzy Model Based on Deformed Linear Models
Amongst possible choices for identifying complicated processes for prediction, simulation, and approximation applications, high-order Takagi-Sugeno (TS) fuzzy models are fitting tools. Although they can construct models with rather high complexity, they are not as interpretable as first-order TS fuzzy models. In this paper, we first propose to use Deformed Linear Models (DLMs) in consequence pa...